A Framework for Integrating Heterogeneous Sporadic Knowledge Sources into Automatic Speech Recognition
Authors
Abstract
Heterogeneous knowledge sources that model speech only at certain time frames are difficult to incorporate into speech recognition with standard multimodal fusion techniques. In this work, we present a new framework for integrating such sporadic knowledge into standard HMM-based ASR. In the first step, each knowledge source is mapped onto a logarithmic score using a sigmoid transfer function. These scores are then combined with the standard acoustic models by weighted linear combination. Speech recognition experiments with broad phonetic knowledge sources on a broadcast news transcription task show improved recognition results when the knowledge provides information complementary to that of the ASR system.
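The two-step integration described in the abstract can be illustrated with a minimal Python sketch. It assumes that the logarithmic score is the log of a sigmoid applied to the raw knowledge-source output (with a hypothetical slope and offset shared by all sources), and that a knowledge source which is silent at a given frame simply contributes nothing to the combination. The function names `sigmoid_log_score` and `combined_frame_score`, the parameter values, and the handling of silent frames are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Optional, Sequence


def sigmoid_log_score(x: float, slope: float = 1.0, offset: float = 0.0) -> float:
    """Map a raw knowledge-source output x to a log-domain score.

    Assumed form: log(sigmoid(slope * (x - offset))), which is always <= 0.
    """
    z = slope * (x - offset)
    # Numerically stable log-sigmoid: both branches equal log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def combined_frame_score(
    acoustic_log_lik: float,
    ks_outputs: Sequence[Optional[float]],
    weights: Sequence[float],
    acoustic_weight: float = 1.0,
    slope: float = 1.0,
    offset: float = 0.0,
) -> float:
    """Weighted linear combination of the acoustic log-likelihood and the
    sigmoid-mapped knowledge-source scores for one frame and HMM state.

    Each entry of ks_outputs is assumed to be a detector's evidence for the
    hypothesized state's broad phonetic class, or None when the (sporadic)
    source gives no output at this frame; None entries are skipped.
    """
    if len(ks_outputs) != len(weights):
        raise ValueError("need exactly one weight per knowledge source")
    total = acoustic_weight * acoustic_log_lik
    for x, w in zip(ks_outputs, weights):
        if x is None:
            continue  # source is silent at this frame: no contribution
        total += w * sigmoid_log_score(x, slope, offset)
    return total


if __name__ == "__main__":
    # Hypothetical frame: acoustic log-likelihood of -42.0 and two broad
    # phonetic detectors, the second of which is silent at this frame.
    score = combined_frame_score(-42.0, [2.0, None], weights=[0.5, 0.3])
    print(f"combined score: {score:.4f}")
```

Because each mapped score is at most zero, a knowledge source that fires can only lower a hypothesis's total, and the penalty shrinks as its output grows; the weight controls how strongly it influences the search. In this sketch the weights, slope, and offset are free parameters that would have to be tuned, for example on held-out data.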
Similar resources
From decoding-driven to detection-based paradigms for automatic speech recognition
We present a detection-based automatic speech recognition (ASR) paradigm that is capable of integrating both the knowledge sources accumulated in the speech science community and the modeling techniques established in the speech processing community. By exploring this new framework, we expect that researchers in the Interspeech community can collaboratively contribute to developing next generat...
Integrating Multi-level Linguistic Knowledge with a Unified Framework for Mandarin Speech Recognition
To improve the Mandarin large vocabulary continuous speech recognition (LVCSR), a unified framework based approach is introduced to exploit multi-level linguistic knowledge. In this framework, each knowledge source is represented by a Weighted Finite State Transducer (WFST), and then they are combined to obtain a so-called analyzer for integrating multi-level knowledge sources. Due to the unifo...
A continuous speech recognition system integrating additional acoustic knowledge sources in a data-driven beam search algorithm
The paper presents a continuous speech recognition system which integrates an additional acoustic knowledge source into the data-driven beam search algorithm. Details of the object-oriented implementation of the beam search algorithm will be given. Integration of additional knowledge sources is treated within the flexible framework of Dempster-Shafer theory. As a first example, a rule-based plo...
Towards a unified framework for sub-lexical and supra-lexical linguistic modeling
Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface, speech recognition as the front-end of speech understanding remains one of the fundam...